A Rational Approach to Ring Flexibility in Internal Coordinate Dynamics 
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Internal coordinate molecular dynamics (ICMD) is an efficient method for studying biopolymers, 
but it is readily applicable only to molecules with tree topologies, that is with no internal flexible 
rings. Common examples violating this condition are prolines and loops closed by S-S bridges in 
proteins. The most important such case, however, is nucleic acids because the flexibility of the 
furanose rings always plays an essential role in conformational transitions both in DNA and RNA. 
There are a few long-known theoretical approaches to this problem, but, in practice, rings with 
fixed bond lengths are closed by adding appropriate harmonic distance restraints, which is not 
always acceptable especially in dynamics. This paper tries to overcome this handicap of ICMD 
and proposes a rational strategy which results in practical numerical algorithms. It gives a unified 
analytical treatment which shows that this problem is very close to the difficulties encountered 
by the method of constraints in Cartesian coordinate dynamics, and certain ideas of the latter 
appear helpful in the context of ICMD. The method is affordable for large molecules and generally 
applicable to all kinds of rings. A specific implementation for five-membered rings is described and 
tested for a proline-rich polypeptide and a decamer DNA duplex. In both cases conditions are found 
which make possible time steps around 10 fsec in ICMD calculations. 



I. INTRODUCTION 

Internal coordinate molecular dynamicsEH3'tM (ICMD, 
see Ref. || for a historical review) is a recent approach 
in the simulation of flexible polymers which, unlike the 
traditional one, employs torsions and, if desired, valence 
angles and bond lengths as generalized coordinates in 
the equations of motion. It originates from the Euler- 
Lagrange-Hamilton formalism of classical mechanics and 
makes possible modeling of polymers as chains of rigid 
bodies, which automatically eliminates the most severe 
time step limitations characteristic for Newtonian MD. In 
addition, it drastically reduces the configurational space 
of flexible molecules, which is very useful for confor- 
mational searchesd'EI, and for refinement of experimental 
structures by simulated annealingp'Q. 

Treatment of flexible cycles is an inherently difficult 
task for ICMD. The possibility of applying this method 
to large polymers rests upon recursive algorithms which 
all are applicable only wh<^Jie_molecule is topologically 
isomorphous to a treeB&HEJiij. Any non-rigid cycle, 
therefore, makes the whole system unsuitable for ICMD. 
Although a few possible, approaches to this problem can 
be readily sketched,E£rli3 until now, no practical solution 
has been reported. This difficulty is most critical for 
simulations of nucleic acids because the five-membered 
sugar rings connect the bases with the sugar-phosphate 
backbone, and their relative orientations are, therefore, 
determined by the pseudorotation states of the sugars. 
As a result, nucleic acids are completely beyond the scope 
of ICMD although one. cannot say that the problem did 
not attract attention. Ej 

This paper describes a rational solution of the prob- 
lem of ring flexibility in ICMD. The new approach has 



much in common with the recent methods of constraints 
in Newtonian dynamicsJIa The similarity between the dif- 
ficulties created by ring flexibility in ICMD, and bond 
length constraints in Newtonian MD, is intuitively clear. 
Both these problems can be consistently treated by us- 
ing projection operators in linear spaces, which results 
in a unified formalism. A fruitful idea used, sometimes 
implicitly, in Newtonian constraint dynamics is that pro- 
jection operators can be applied directly in the finite- 
difference form of the equations of motion. This simpli- 
fies computations because certain terms in the analytical 
equations can be omitted, and immediately results in a 
time-reversible symplectic numerical integrator. 

A specific implementation for the most practically im- 
portant case of five-membered rings is described and 
tested here. It epjjlays our earlier analytical approach 
to ring flexibilityJlaO but can also be adapted to alter- 
native methods, for instance, with various pseudorota- 
tion equations. Numerical algorithms suitable for other 
specific cases, such as S-S bridges in proteins, are also 
tested, but such cases are not studied here in detail. For 
five-membered rings in proteins and nucleic acids specific 
conditions that make possible time steps around 10 fsec 
are considered. 



II. RESULTS AND DISCUSSION 

Overview of Earlier Approaches 

In this section we outline the general view of the prob- 
lem of ring flexibility and try to divide it into distinct 
separable tasks. This problem has been first addressed by 
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FIG. 1. Schematic representation of a loop of rigid bodies. 
To impose a tree topology loop (a) must be considered as 
broken, which can be done in many ways, for example, as in 
loops (b) or (c). Rigid bodies shown as open circles are joined 
by hard sticks in pivots shown as closed circles. 



Go and Scheragalla who formulated the following general 
approach later followed by others. Consider the example 
in Figj^. Suppose we want to describe a flexible loop (a) 
with certain internal coordinates that determine variable 
angles at pivots and lengths of bonds. A correct def- 
inition of internal coordinates always requires that the 
molecule is represented as a regular tree, which means 
that, conceptually, loop (a) in Fig.[l] must be considered 
broken in agreement with a certain tree topology, while a 
correct loop closure is maintained by applying appropri- 
ate constraining conditions. Hereafter this construction 
is referred to as the underlying tree. The constrained 
underlying tree gives the desired model with correctly 
closed loops, but, in derivations, its unconstrained move- 
ments are also sometimes considered. Structures (b) and 
(c) in Fig. l] demonstrate that there are numerous ways 
to define the underlying tree of any given loop. 

Now suppose we need to sample from the ensemble of 
closed loop conformations. Internal coordinates can be 
denoted as an n-vector q and they should vary concert- 
edly so that the k scalar constraining conditions 



C„(q)=0, n=l,. 



(1) 



are not violated. By inverting Eqs. ([!]), a set of k internal 
coordinates can be, in principle, computed from the rest, 
which may be written as 



q d (q f ) 



(2) 



where q d and q f are the k-vector of dependent and (n- 
k)-vector of independent variables, respectively. Now q f 
can be freely varied within the range of solubility of Eqs. 
(|j). Dependant coordinates q d computed by Eq. (g) 
always provide a correct loop closure in the underlying 
tree. Time derivatives of Eq. (0) read 



dq d 
dq 1 
dq d . 
dq { 



1 =^f c l' 



. f d dg d 
q dt dq f 



(3a) 
(3b) 



where dq d /dq f is a k x (n — k) matrix. This approach 
may be called consistent because, following to the basic 
idea of internal coordinates, it reduces the number of in- 
dependent variables to the true number of conformational 
degrees of freedom. 

The above reasoning reveals the first task to be con- 
sidered, that is, inversion of Eqs. (Q) or, in other words, 
construction of a correctly closed ring geometry for any 
given q f . This is generally difficult because Eqs. (Q) are 
non-linear. Practical solutions exist only in a few special 
cases, notably, there are relatively simplo-alg<p|thms for 
loop closure with flexible valence angles JldOtLj For the 
important case of five-membered rings, pseiidorotation 
equations give the most economical solutionEJ The sec- 
ond task is the calculation of dependent velocities and 
accelerations, but usually this presents no serious diffi- 
culties. 

The possibility to apply this general strategy in ICMD 
was first addressed in Ref. [l3| Explicit equations of mo- 
tion in independent variables can be obtained by linear 
transformations of the familiar ICMD equations for the 
unconstrained underlying tree. It is useful to reproduce 
this result here in a compact form. Let us construct 
n x (n — k) matrix P as 

P dq 

dq f ' 

Every column in P is composed of partial derivatives of 
all internal coordinates with respect to a certain indepen- 
dent variable. Thus, each column has one unit element, 
n — k — I zeroes and k derivatives of dependent variables. 
Any vector q f yields velocities of the underlying tree 

q = Pq f 

that fulfill the loop closure conditions. Similarly, the con- 
strained accelerations are given by 



q = Pcf + Pq* 



(4) 



Consider the unconstrained motion of the underlying 
tree. The corresponding equations of motion can be writ- 
ten asEl 



M(q)q = f(q)+u(q,q), 



(5) 



where M(q), f(q) and u(q, q) are the mass matrix, the 
vector of generalized forces and the inertial term, respec- 
tively. In Rcf. [l3] equations for independent variables in 
a correctly constrained underlying tree were obtained by 
summing up scalar lines in Eq. (pf) corresponding to de- 
pendent variables with coefficients from matrix P, and 
excluding q d with the help of Eq. (||b). These calcu- 
lations may be equivalently expressed in a matrix form 
as 



"MPq f = P* (f + u - MPq f ) 



(6) 
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where the asterisk denotes transposition. All accelera- 
tions for the constrained underlying tree are computed 
by substituting q f into Eq.(^). For the following discus- 
sion it is convenient to rewrite it as 



ivr 1 (f + u) 



P(P*MP) _1 P*M 



I P (P*MP) P*M Pq 



(7) 



Equations (JsJ) and (0) clearly show that the perfor- 
mance of any numerical algorithm originating from the 
consistent approach is limited by the necessity of invert- 
ing the (n — k) x (n — k) mass matrix P*MP. This 
no longer corresponds tOj-aJa-ee topology therefore the 
fast recursive algorithmailJ'El are inapplicable and, at 
least at present, only straightforward inversion is pos- 
sible, which needs O [(n — fc) 3 ] computations. In prac- 
tice a reasonable maximum number of variables for cal- 
culations with a straightforward mass matrix inversion 
is around 100. This approach, therefore, is hardly prac- 
tical for S-S bridges and prolines in globular proteins. 
A minimalist representation of nucleic acids needs about 
eight degrees of freedom per nucleotide, which comprises 
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two pseudorotation variables for a furanose ringOtU, one 
torsion for the base and all backbone torsions. Thus, 
a decamer duplex would be quite manageable since ma- 
trices of separate strands are inverted separately, but a 
hairpin of the same lengths would be more difficult. 

This difficulty could be avoided, in principle, if we did 
not divide internal coordinates into dependent and inde- 
pendent variables. For instance, we could compute all 
reaction forces in a closed loop and include them explic- 
itly into equations of motion of the unconstrained tree, 
which should give correct closed loop dynamics. This 
idea has been in fact elaborated in robot mechanicstj 
and, at least theoretically, it should result in an O(n) al- 
gorithm. Note also that this would be equivalent to the 
introduction of holonomic constraints explicitly, similarly 
to the well-known .method of constraints in Cartesian co- 
ordinate dynamicsEd. In a certain sense such an approach 
may be called inconsistent because the resultant system 
is constrained both implicitly and explicitly, but we will 
see below that this appears more efficient. 



Projection Operators 

In this section we briefly recall a few necessary facts 
concerning projection operators in linear spaces. Con- 
sider a mechanical system of N free particles. Its config- 
uration is determined by an n-vector of Cartesian coor- 
dinates x, where n = 3N. Forces applied to particles are 
given by an n-vector f . The possible values of f cover a 
full n-dimensional vector space C n . 

Now let us assume that our particles are bound to move 
along some hypersurface defined by a constraint equation 

C(x) = 



and that initially they are placed at point Xo with zero 
velocities. Vector 



g = VC 



x 



where V is a multidimensional gradient operator, is or- 
thogonal to the constraint hypersurface and all its or- 
thogonal vectors form an (n — l)-dimensional subspace 
in C n called tangent hyperplane •p™ -1 . The particles are 
held on the constraint hypersurface by reaction forces 
given by vector collinear to g. Together with f this 
vector must give f " E P" -1 : 



fll 



f + f J 



f + a(f)g 



(8) 



where a(f ) is a scalar function. 

It is clear that Eq. (||) applies to any vector f G C n . It 
is known also that the sum of any two free forces fi + f2 
gives the corresponding sum f]' +f| , and that f multiplied 
by a constant results in the same multiplication of f " . In 
other words, Eq. (||) in fact defines a linear mapping 
f — > fH from C n to "P™ -1 , and, consequently, it can be 
expressed as 



Tf 



(9) 



where T is an n x n matrix. 

By construction, T 2 = T, that is T represents a pro- 
jection operator. Any such matrix can be calculated as 



(I-T) = (I-D(G*D)" 1 G*) 



(10) 



where I is the unit matrix, D and G are basis matrices 
of the right and left zero spaces of T, respectively. By 
definition, vector e belongs to the right zero space of T 
if Te = 0. Such vectors form a linear subspace T> k in 
£™ and n x k matrix D in Eq. ( JlCj ) consists of k its 
basis vectors. It is readily verified that TD = 0. Matrix 
G is constructed in the same way, and it is seen that 
G*T = 0. The two zero spaces V k and Q k always have 
the same dimension and they define a unique projection 
operator. 

The physical meaning of T> k and Q k is clear from the 
above example. T> k determines the direction of projec- 
tion. Equation (^|) shows that, in our example, the corre- 
sponding basis consists of a single vector g. In turn, Q k 
is the orthogonal compliment to the tangent hyperplane 
•pn-fc j n Qur exam pi e gi ^ i s identical to V 1 , and so we 

have 



T = l-g(g*g)-V 
Finally, consider 

T = D(G*D)~ 1 G*. 



(11) 



It is readily verified that T is also a projecting operator. 
It projects upon V k along the orthogonal subspace of Q k , 
that is, along the tangent hyperplane , p n ~ k ] thus making 
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a complimentary pair with T. Equations (|10|) and flll| ) 
can also be applied to a single projection, which gives 
two alternative representations. In both cases, the op- 
erator is denned by the direction and the hyperplane of 
the projection. However, in case of Eq. ( |l0| ) one has to 
substitute the direction itself and the orthogonal com- 
pliment for the plane, whereas in Eq. ([y]) the opposite 
combination is required. 



The Analytical Equations of Constraint Dynamics 

In this section we derive the basic equations of motion 
for constraint dynamics directly from the above projec- 
tion considerations. These equations are well-known, but 
our main interest is in the reasoning rather than in the 
result because similar arguments will later lead us to the 
required numerical algorithm. 

We consider again a system with n degrees of freedom 
described by an n-vector q and the equations of motion 
(||) . We assume that these equations can be somehow de- 
rived and solved without major difficulties. For Newto- 
nian MD these are equations for unconstrained molecules 
while in the case of ICMD these are equations for trees 
of rigid bodies. These two models thus play similar roles 
in our reasonings. 

Now let us impose on our system a certain number of 
explicit constraints defined by Eq. (Q). For clarity, from 
now on we drop superscripts denoting dimensions of sub- 
spaces. Equations (^) define an (n-k)-dimensional hyper- 
surface in C and the system is held on it by generalized 
reactions f £ Q where Q is the subspace of orthogonal 
vectors 

g„ = VC M (q) 

with an n x k basis matrix G. A straightforward deriva- 
tion of equations of motion for the constrained system 
would require evaluation of f with subsequent substitu- 
tion into the r.h.s. of Eq. (H): 



Mq = f + u + f J 



(12) 



Calculation of reactions can be bypassed as follows. 

By taking time derivatives of Eq. ([j]) one obtains con- 
straining conditions upon the tree velocities and acceler- 
ations 



g M q + s^q = 



(13a) 
(13b) 



Let us first consider the case g M = 0. One can assume, 
for instance, that particles are moving along space-fixed 
surfaces. In this case the second term in Eq. (13 b) is 
zero and we see that the constrained accelerations belong 
to the tangent hyperplane V. Note also that accord- 
ing to Eq. ( |l2| ) constrained accelerations are obtained 
by correcting the corresponding unconstrained vector by 



M _1 f- L , that is, by a vector from a subspace with the 
basis matrix M _1 G. We see, therefore, that calculation 
of the constrained accelerations is nothing but a projec- 
tion C — ► V along direction M _1 G. The corresponding 
operator is readily computed according to Eq. (M) 



T = (I-T)= I-M- 1 G(G*M- 1 G) 1 G" 



(14) 



By applying it in Eq. (|1J) we obtain equations of motion 
q=TM _1 (f + u) = (15) 



G* 



M _1 (f + u). 



Now consider the case g M ^ 0. Equation (J13|b) in- 
dicates that, unlike velocities, the constrained accelera- 
tions no longer belong to V, but lie in another hyper- 
plane which is shifted from the zero of the coordinates. 
It does not, therefore, represent a subspace in C and the 
constrained accelerations can no longer be obtained by a 
linear mapping like Eq.(j|). Let us, however, decompose 
vector q as 

q = q ll +q ± (16) 

where q" £ V and q^ £ Q. By substituting Eq. jl^ ) into 
Eq.© we get 



Mq 11 = f + u - Mq 1 + f J 



(17) 



Now we can again use operator T to compute q" and, by 
substituting back to Eq. ([lq), obtain 



q = TM 1 (f + u) 



(18) 



Thus, we still avoid explicit calculation of reactions if 
vector q^ is obtained separately. This is, fortunately, 
the case. By definition, q^ is a projection of q upon Q 
along hyperplane V . The corresponding operator can be 
computed by Eq. (pi]). (Note that it differs from T in 
Eq. (|l4| ) by its target hyperplane.) By using Eqs. (|TT| ) 
and (llqb) we get 



q 1 - = G (G*G) 1 G*q = G (G*G) 1 G*q. 



(19) 



and substitution of Eqs. (|LJ) and ( |19| ) into Eq. fll8| ) gives 
the required equation of motion 



q = 

I - M _1 G (*G*M~ 



L G) G" 



M- 1 G(G*M- 1 G) G*q 



M- X (f - 



(20) 



Note that for Cartesian coordinates this gives the same 
equation as that derived by using Lagrange multipliersJlj 
On the other hand, it is seen from Eqs. (|]) and ( |l6| ) that 

q x = Pq f 

and thus Eq. (^|) is equivalent to Eqs. ([l8|) and fl2C|). 
They involve the same projection operation, but employ 
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two alternative representations of the operator. These 
two representations correspond to two alternative ap- 
proaches to constraints in dynamics and dictate opposite 
trends in computational strategies. In case of Eq. (0) the 
projection operator is specified by the target plane and 
the orthogonal compliment to the projecting direction, 
that is according to Eq. (|ll|). This leads to inversion of 
a (n — k) x (n — k) matrix and prompts a reduction of 
the degrees of freedom in the system. In contrast, in the 
case of Eq. (|o|) the operator is specified by the direction 
and the orthogonal compliment to the target plane, that 
is corresponding to Eq. ([To|), which results in inversion 
of a k x k matrix and prompts reduction of the number 
of constraints and, consequently, keeping a possibly large 
number of degrees of freedom. In the second case, matrix 
M _1 is used several times, but this is not computation- 
ally limiting and in practice Eq. ( p(i|) appears applicable 
to large systems. 

Let us finally obtain quasi-Hamiltonian equations for 
the same system. For the unconstrained underlying tree 
these equations can be expressed aso 



P 

q 



f(q) + w(q,q) 



(21a) 
(21b) 



where p is the n-vector of conjugate momenta and 
w(q, q) is the corresponding inertial term. In order to 
get the equations for the constrained tree we just need to 
evaluate generalized reactions from Eqs. (U2) and (20) 
and add them to the r.h.s. of Eq. 
reaction is 



lk). The vector of 



f- 1 - = -G (G*M -1 G) 1 [G*q + G*M- 1 (f + u)l , (22) 



and the resultant equations read 



GfG*M _1 G) 1 G*M" 1 



G ^M^G) 
M V 



G*q 



G*M _1 



u 



(23a) 
(23b) 



The last equations are preferred because they are suit- 
able for symplcctic numerical integration and provide 
much better average properties of dynamic trajectories 
with large time steps. 



Numerical Algorithm 



Equations (|23|) have the same general form as Eq. (|2l| ) 
and are suitable for_jbhe implicit leapfrog integrator de- 
signed for the latter. □ However, here I prefer to follow the 
well-tested approach of the Newtonian MD, which avoids 
a straightforward integration of the analytical equations, 
and uses projection considerations to derive a high pre- 
cision numerical algorithm.E3o The main idea becomes 
clear from the following example. Consider an Euler step 
in Cartesian coordinates: 



vi 

Xl 



v 
x 



M" 1 



fo 



(24a) 
(24b) 



where h is the step size and subscripts refer to the 
time step number. Suppose coordinates xo and veloci- 
ties Vo satisfy constraints with absolute accuracy. Equa- 
tion ( 24 a) with an exact reaction computed by using 
Eq. (22), for instance, would give velocities vi with an 
approximation error of 0(h 2 ). A similar error would be 
propagated to the constraint conditions at the next step. 
Note, however, that, according to Eq. (p4|b), we can 
minimize deviation of the trajectory from the next step 
constraint hyperplane V\ by requiring that vi G Vq. Ef- 
fectively, this means that, instead of the true reaction 
force f^, we substitute into Eq. (pija) another vector 
which provides a projection of vo + M _1 fo/i upon Vq. 
Algorithm (E4), therefore, becomes 



vi = To (v + M-%h) 
xi = x + vih 



(25a) 
(25b) 



where projection operator To is conmuted from xg.. 
This basic idea is used in the SHAKEE3 and LINKSO 
algorithms of Cartesian MD where unconstrained bond 
lengths are corrected by using previous bond directions. 
The projection in Eq. (|25|a) does not eliminate the error 
in the constraints, but only gives the best first approxi- 
mation. The residual error can be controlled by feedback 
schemes which add a small out-of-plane correction to-the 
projected vector vi so as to eliminate accumulation 113 

Now consider the implementation of the above strategy 
for Eqs. (||^). With our present notation the kernel of 
the implicit leapfrog integrator for Eq. (|2l]) readscl 

(26a) 
(26b) 





= f(q„) 


° ^n+l 


= q„_i + (q„ 


° P™+* 


= Pn-| +?nh 




= M"i ! p„, 1 


qn+i 


= q« +q n+ $h 



■1 + W r, 



\ (26c) 
(26d) 
(26e) 



where the conventional notation is used for denoting on- 
step and half-step values. The lines m arke d by c ircle s 
are iterated until convergence of Eqs. ( |26b| ) and (26c). 
Reactions depend upon both coordinates and velocities, 
therefore, if co mpu ted explicitly, they should have been 
added to Eq. ( |26cj ) to give 



P„ + i = P n _i +fnh 



h 
2 



+ 



Following the above strategy we require that f 1 t pro- 

n 1 2 

vide a projection of the unconstrained predicted veloci- 
ties upon V n+ i, which results in the following sequence 
of calculations 
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fn = f (q„) (27a) 

° q«+i = q„-| + (q«-i + q n +±) ^ (27b) 

° P«+I =Pn-| +f n /i + 

(w n _i+w n+i )^+f„ x _^ (27c) 

o q n+ i = T„ + iM;|,p„ + i (27d) 

P n+ i = M„ + iq„ + i (27e) 

^_i|=P„+i-p„+i (27f) 

q„+i =q„ + q n+ i/i (27g) 



Equations ( p7| ) represent the kernel of the new algorithm. 
The prediction step necessary to enter the iterative cycle 
of Eqs. (p7|b-p7|d) is made by using the previous half- 
step values instead of q n +i and w n+ i in Eqs. (§7}d) 
and ([27(3) . It is clear, however, that, if not additionally 
controlled, the ideal ring geometry specified by Eqs. (|l|) 
would deg rade because of the approximation errors in 
Eqs. (p7p. There are numerous case-specific ways to 
handle this problem and some of them are considered 
below. 



Implementation for Five-Membered Rings 

In this section we consider an implementation of the 
above algorithm with examples of specific solutions of 
the remaining practical difficulties, such as the construc- 
tion of a correctly closed loop conformation and calcula- 
tion of the projection operator. Our present implemen- 
tation is specifically suited for five-membered rings and 
has been tested for proline-rich polypeptides and nucleic 
acids. Cystine bridges in proteins present a somewhat 
different case and should better be treated separately. 

Consider once more Fig.yjj-Jt is known since the first 
analysis by Go and ScheragaEj that the problem of loop 
closure in internal coordinates is generally reduced to six 
coupled equations. Loop (b) in Fig. [TJ shows where this 
number comes from. If the loop is broken as shown, each 
half of the broken rigid body has six degrees of freedom. 
In order to close the loop, six rigid body coordinates of 
the two parts must be equated. Structure (c) in the same 
figure shows, however, that the complexity of the prob- 
lem can sometimes be reduced just by choosing a different 
underlying tree. We see that, if all angles are variable, 
closure in loop (c) needs only one distance constraint. 
The number of constraining conditions is always equal 
to the number of degrees of freedom taken from the un- 
derlying tree by the loop closure. All three loops in Fig. 
[TJ are similar, but the underlying tree in (b) has five de- 
grees of freedom more than in (c). For our algorithm 
construction (c) is certainly preferable. 

Let us now turn to five-membered rings. It is known 
that their internal motions are well described as pseu- 
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FIG. 2. The underlying tree of a five-membered ring. 
Atoms are numbered 1,...,5 corresponding to the natural tree 
ordering. All bond lengths are fixed. Shown by arrows are 
five internal coordinates of the underlying tree that determine 
the ring conformation. 

dorotation with only two parameters@@. Pseudorota- 
tion equations can give constraints Eqs. (0) in an explic- 
itly inverted form, which simplifies calculations. How- 
ever, such small rings are flexible only if valence angles 
vary and, therefore, any underlying tree has rather many 
degrees of freedom. Thus, the number of scalar con- 
straints effectively introduced by the pseudorotation ap- 
proach is very large even compared with loop (c) in Fig.[P. 
On the other hand, as we just have seen, this number, 
in principle, can be reduced to one per ring, and this 
appears rather easy. 

Figure || shows the underlying tree for a five- 
membered ring. Atoms are numbered 1,..,5, with 
this ordering in ribose or deoxyribose corresponding to 
C 4 , C' 3 , C' 2) Ci, O^ and the broken bond C^...O^. The 
ring conformation is determined by five valence and di- 
hedral angles qi, (75 indicated by arrows. Let ri, r 5 
denote atom position vectors and Uj ande^, i,j = 1,...,5 
denote the interatomic distances and the corresponding 
unit vectors. Directions and positions of rotation axes of 
ring variables are specified by the unit vectors ei, ...,es 
and position vectors r mi , ...,r m5 , respectively. The con- 
straint condition is 

C=|r 6 -n|-IiB=0 (28) 

We consider firstly calculation of the projection operator 
T. Each ring contributes to the basis matrix G a single 
vector g with only five non-zero components obtained by 
straightforward differentiation of Eq. (^8|) : 

dC 

g% = = ei 5 • ei x (r 5 - r m J . (29) 

These computations are sufficient to evaluate T. In prac- 
tice it is used in Eq. (|27|d) in the factorized form of 
Eq. (|l4[), which results in several matrix- vector multi- 
plications. Only the term (G*M _1 G) 1 needs to be 







computed separately, and the cost of these computations 
appears minor. The product k x k matrix is small and 
essentially diagonal because constraints in different rings 
are only weakly coupled. In addition, I have found that 
this term converges faster than the overall iterative cycle, 
Eqs. (|27|b-d), and there is no need to recalculate it after 
the first two iterations. 

The small deviations of the constrained bond lengths 
ii5 caused by the approximation errors papJie. either elim- 
inated by exact analytical ring closureEjO'EJ or reduced 
to a low and stable level by feedback corrections £3 The 
analytical closure is simple. Let us take variables q±, (74 
as independent and compute the last valence angle 95 so 
that Eq. ( p8| ) is fulfilled. Variables q\,...,qi specify po- 
sitions of atoms 1,...,4 and the orientation of the plane 
of (75 specified by vectors 634 and an in-plane unit vector 
6345 orthogonal to 634. We may write 



= Tq, 1+ i = M X G (G*M _1 G) 1 G*q^ 



(34) 



i"45 = ^45 {xe 3i + ye 345 ) 



(30) 



where x and y are the two unknown in-plane coordinates 
of the unit vector 645 They are found from the constraint 
Eq. ( p8| ) and the normalization condition, which results 
in 



x (ei4e 34 ) + y (ei 4 e 3 4 5 ) 



;2 _ 72 

'15 '45 



I 2 



14' 45 



y 2 = i 



(31a) 
(31b) 



This system is reduced to a square equation and gives a 
single x > solution, which solves the task. Derivatives 
of the dependent angle 55 , which are normally used only 
for computing energy gradients during minimization, are 

The feedback algorithm is constructed as follows. Sup- 
pose, at the nth step, we have a non-zero value C„ in Eq. 
). We require that 



dC 



(32) 



which means that at the next time step the accumulated 
error must be zeroed to the first order. It is clear that 
for each ring only the component of q n+ 1 orthogonal to 
the constraint hypersurface matters and from Eq. (32) 
it is evaluated as 



HP* 



(33) 



Vectors g are mutually orthogonal and by combining Eqs. 
( p3| ) for all rings we obtain a corresponding component 
(pT ! for the whole molecule. To compute the correcting 

velocity q c 1 we require that it result from a variation 

n+ 2 

of reactions and thus belongs to the subspace with the 
basis matrix M _1 G. This gives a projection 



The last computation tends to reduce the extra mechan- 
ical work introduced by the algorithm. The correction 
obtained may be used inside the iterative cycle of the 
algorithm ( p7j ) or just added to q n+ i at the end. Also, 

q^ , 1 can be computed for on-step or for half-step confor- 

n+ 2 

mations, or both such vectors may be combined. These 
alternatives give a series of slightly different algorithms 
which are compared in the numerical tests below. Note, 
however, that both the analytical closure and the feed- 
back schemes break the time reversibility of algorithm 
( p7| ) and should generally increase the drift of the total 
energy. 



Numerical Examples 

In the numerical tests presented below we address two 
issues. First, we test the accuracy of the ring closure pro- 
vided by the new algorithm with and without additional 
corrections. Second, we check its stability with elevated 
time steps, notably, the possibility of step sizes around 10 
fsec for proteins and nucleic acids. This specific value is 
targeted because it has been found optimal, in a certain 
sense, for in-water simulations of proteinscJ. 

We consider two test examples: one for proteins and 
one for nucleic acids. The first is a 36 residue frag- 
ment of ,a,collagen triple helix which, involves 24 prolines 
(file lbfiEJ in the protein database.c3) Parameters-fpr the 
amino acids were taken from the AMBER94 setH Ex- 
cept for prolines, all bond lengths and bond angles were 
fixed at standard values according to the standard geome- 
try approximation. Ej The second test system is a decamer 
DNA duplex (TA)* . Nucleotide geometry was taken from 
FLEX force fieldel to provide compatibility with JUMNA 
programEa which was used for preparation of the initial 
duplex conformations. Except for sugar rings, the geom- 
etry of the nucleotides was fixed, that is the bases were 
rigid and all other bond lengths and bond angles were 
fixed. Calculations have been performed without explicit 
solvent by using the AMBER94 force fieldN with a di- 
electric constant e = r and phosphate charges in DNA 
reduced by 0.5. These conditions emulate the effects of 
ion condensation and provide a reasonably accurate apt 
proximation in conformational analysis of nucleic acids.E3 

Pyrrolidine and furanose rings are treated similarly by 
a single program. The furanose underlying tree has been 
detailed above. In the analogous construction for pro- 
lines the NC5 bond is broken and replaced by a distance 
constraint. Our approach generally allows for arbitrary 
freezing of internal coordinates and thus can consider nu- 
merous different representations of such ringsa. Here we 
consider a model with only bond lengths fixed and all 
intra-cyclic valence angles free. These choices leave sev- 
eral fast bond angle bending modes active, notably, the 
scissor H-C-H mode with a frequency of around 1500 
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FIG. 3. Characteristic time step dependencies of the total 
energy (a) and of the E-drift (b) for a decamer DNA duplex 
(TA)io. Low temperature plots are shown by thinner lines. 
For comparison, in (a) the low temperature deviations are 
scaled and shifted to fit the range of deviations observed with 
normal temperature. Similarly, in (b) the low temperature 
E-drift is multiplied by 100. The dashed horizontal lines in 
(a) show the band of acceptable deviation defined in the text. 

cm - 1 . Theory!! says that for a step size of 10 fsec, 
the maximum frequency should be 3 times lower. To 
achieve this, additional moments of inertia o£ 9 amu-A 2 
are applied to C-Ii-bopds as described earlier o Following 
earlier conclusionsoEa other hydrogen-only rigid bodies, 
like thymine methyls, have additional inertia of 4 amu-A 2 
added. 

With these modifications, the pseudorotation normal 
modes with frequencies around 550 and 640 cm -1 for 
DNA and collagen, respectively, become the fastest and 
they already correspond to a harmonic characteristic 
time step between 9 and 10 fsec for leapfrog-equivalent 
integrators.El However, with unfavorable collision angles, 
non-hydrogen ring atoms with all valence angles free have 
a considerably smaller effective ineptia than similar atoms 
in models with fixed bond angles.E^I In preliminary tests 
(not shown here) I observed an anharmonic limitation 
below 10 fsec at normal temperature which could be over- 
come by an additional increase of the ring inertia. Here 
I show results obtained with the moments of ring C-C 
and C-0 bonds increased by 15 amu-A 2 and 2 amu-A 2 , 
respectively. This should make the inertias of all ring 
bonds similar and approximately equal to that of a wa- 
ter molecule, with a 50% increase for C-C bonds. As 
a side effect, the fastest pseudorotation frequencies are 
shifted down below the already lowered hydrogen scissor 
modes, and the latter thus remain the fastest in both test 
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FIG. 4. Characteristic time step dependencies of the total 
energy (a) and of the E-drift (b) for a dodecamer collagen 
triple helix. Notation is same as in Fig. ^[ 



systems. 

Stability and step size limits were evaluated with the 
testing technique and cquilibratioxL protocols proposed 
and analyzed in detail elsewhere. E3'E3 In this method, 
the test trajectory is repeatedly calculated always start- 
ing from exactly the same constant-energy hypersurface. 
In each run, certain system averages are evaluated and 
compared with "ideal" values, i.e. the same parameters 
obtained with a very small time step. The choiccj-of such 
parameters has been discussed in detail earlier £3 Here 
we use only the total energy, E = U + K and its drift 
(.E-drift), where U and K are average potential and ki- 
netic energies computed for integer steps and half-steps, 
respectively. As in the previous studyE3 we take a devia- 
tion of 0.2D[U], where D[] denotes operator of variance, 
as the upper acceptable level for deviation of E. The step 
size maximum determined is denoted as h c and called 
"characteristic". She E-drift is exactly zero for ideal 
harmonic systems,c3 and is thus a good indicator of an- 
harmonic effects. Virtually harmonic conditions are sim- 
ulated by reducing the temperature dow_n to 0.1K with 
the same equilibration protocol as before.E3 Relevant har- 
monic frequencies were evaluated from low temperature 
spectral densities of autocorrelation functions of appro- 
priate generalized velocities. In all production runs the 
duration of the test trajectory was 10 psec. 

The results of such testing are shown in Figs. || and 
||. For both model systems the low and normal temper 
ature plots have characteristic qualitative differences]^ 
but h c does not change with temperature indicating that 
the time step limitations are harmonic. The h c values 
are close to the expected harmonic estimate. The small 



S 



difference observed between the two systems can be at- 
tributed to a three times larger number of hydrogen scis- 
sor modes in collagen. We see that the projection step 
in algorithm (^7|) does not deteriorate the high stabil- 
ity of the original leapfrog algorithm. We conclude also 
that our model for five-membered rings allows calcula- 
tions with h « 10 fsec. The last conclusion has been 
checked by computing several nanosecond trajectories of 
different DNA oligomers (results not shown). 

Plots in Figs. [|and [| are obtained by algorithm (27) 
without corrections of constrained bond lengths. Before 
considering the effects of such corrections let us look more 
carefully at how constraint distances behave in the above 
conditions. Since algorithm ( |27| ) keeps no information 
about the initial bond lengths a diffusive drift from initial 
values is possible. Note, however, that the constrained 
distances are just additional first integrals of the con- 
strained equations of motion, like momenta or the total 
energy. For leapfrog-equivalent integrators deviations in 
first integrals caused by approximation errors are nor- 
mally oscillatory rather than diffusive, and the drift may 
thus be small. 

Figure || shows the time variation of a CN5 bond in 
proline during a 1 nsec trajectory. This trajectory has 
been computed for a single terminally blocked residue at 
normal temperature with the same conditions as above. 
It is seen that the fluctuations have many time scales, but 
even in the slowest one they are not evidently diffusive. 
The insertion plot shows that the high frequency ampli- 
tude of the fluctuations levels starting from the very first 
time step. It scales as 0(h 2 ) in agreemeruLwith the gen- 
eral properties of leapfrog- like algorithmsEj (not shown) . 
The high-frequency amplitude is evidently larger than 
diffusive deviations accumulated for tens of picoseconds, 
and, therefore, infrequent periodical corrections should 
be sufficient to keep deviations within this range. Such 
a possibility is illustrated by the solid line in the same 
figure where the analytical ring closure was applied once 
every 10 psec. One may note that this gives a reasonable 
and certainly the safest correction strategy. 

Table | compares several correcting schemes discussed 
in the previous section. We noted above that numer- 
ous feedback algorithms are possible due to variation of 
two conditions. First, correction of generalized velocities 
can be added within the iterative cycle or after it. The 
last option, however, always yields a considerably higher 
E-drift and such algorithms are not included in Table |. 
Second, there are two possible directions of the correcting 
vector computed for on-step rings and half-step rings, re- 
spectively. Only one of the two directions, or an average 
of the two vectors, can be used, or else they can alternate 
between time steps. In the last case, however, a dramatic 
loss of the overall stability of the algorithm is induced. 
Thus, among many variants only three feedback strate- 
gies included in Table [j] give acceptable results. The data 
have been obtained for 10 psec trajectories of the DNA 
decamer with a 10 fsec step size. Trajectories started 
from the same initial state with sugar rings closed ex- 




400 600 
Time (psec) 



FIG. 5. Time variation of the constrained bond length in 
a separate terminally blocked proline residue during a 1 nsec 
trajectory. The dott ed p lot corresponds to a trajectory com- 
puted by algorithm (EJJ) with no corrections. Its initial part 
is detailed in the insertion. The solid line corresponds to a 
trajectory computed similarly, but with the analytical ring 
closure applied once in 10 psec. 



actly. 

We noted above that any correcting algorithm is likely 
to increase the E-drift. It appears, however, that the in- 
crease is usually below the noise level for the step sizes 
of interest, which is clearly seen in Table Q and Fig. § 
Except for half-step feedback corrections the E-drift is 
within the range of fluctuations in Fig. |^(b). A cer- 
tain increase in E-drift is observed, however, with much 
smaller time steps as well as with h > h c . Very good 
results are obtained with analytical ring closure, which 
seems to be the best choice for furanose and proline rings. 
It should be noted that, in this case, with any step size, it- 
erations in Eqs. (^) converge only up to a relative accu- 
racy of 10 -5 and then start looping. This, however, does 
not seem to affect either the accuracy in terms of energy 
conservation, or the long time stability of the algorithm, 
which also has been checked for nanosecond trajectories. 

Although feedback algorithms appear unnecessary for 
five-membered rings they still present significant interest 
especially for S-S bridges were both the analytical closure 
and the periodical corrections are not easy. Table | shows 
that, as expected, the three feedback algorithms improve 
the accuracy in the targeted rings. Improvements are not 
spectacular, but it is important to note that the corrected 
levels of deviations are stable in time. In this respect the 
last algorithm in Table |, which manages to correct both 
on-step and half-step distances, is the most promising. 
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TABLE I. The quality of the ring closure obtained with different correcting strategies. Data from 10 psec trajectories of 
(TA)b starting from the same initial state with all rings closed exactly. Time step 10 fsec. The rms deviations of closing bond 
lengths from ideal values are shown, with corresponding maximal values given in brackets. 



Algorithm 


on-step rings (xlO 3 A) 


half-step rings (xlO 3 A) 


E-drift (xlO 2 kcal/pscc) 




No correction 


1.50(11.3) 


0.757(6.34) 


-9.23 




Analytical closure 


0(0) 


0(0) 


3.49 




Feedback on-step 


0.294(1.84) 


1.52(7.81) 


5.71 




Feedback half-step 


1.53(12.0) 


0.369(2.90) 


-39.5 




Feedback mixed 


0.837(5.72) 


0.852(6.04) 


-8.94 





III. CONCLUSIONS 

This study is, to my knowledge, the first successful 
attempt to develop a practical ICMD approach to large 
molecules with internal flexible rings. It is shown here 
that the strategy referred to above as "inconsistent", 
that is imposing explicit constraints upon a system al- 
ready constrained implicitly, results in algorithms as fast 
and as stable as those for ICMD simulations of poly- 
mers with the tree topology. For the important case of 
five-membered rings in nucleic acids and proteins, calcu- 
lations with time steps around 10 fsec are shown to be 
possible. 
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